replace xml.NewEncoder with xml.EscapeText#2100
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2100 +/- ##
=======================================
Coverage 99.20% 99.20%
=======================================
Files 32 32
Lines 30096 30102 +6
=======================================
+ Hits 29858 29864 +6
Misses 158 158
Partials 80 80
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
This change will cause xml:space="preserve" attribute of t element missing. The \n new line will doesn't work.
Before:
<c r="A1" s="1" t="inlineStr">
<is>
<t xml:space="preserve">text
</t>
</is>
</c>After this PR change:
<c r="A1" s="1" t="inlineStr">
<is>
<t>text
</t>
</is>
</c>For example:
package main
import (
"fmt"
"github.com/xuri/excelize/v2"
)
func main() {
f := excelize.NewFile()
defer func() {
if err := f.Close(); err != nil {
fmt.Println(err)
}
}()
sw, err := f.NewStreamWriter("Sheet1")
if err != nil {
fmt.Println(err)
return
}
styleID, err := f.NewStyle(&excelize.Style{
Alignment: &excelize.Alignment{WrapText: true},
})
if err != nil {
fmt.Println(err)
return
}
if err := sw.SetRow("A1", []interface{}{excelize.Cell{Value: "text\n", StyleID: styleID}}); err != nil {
fmt.Println(err)
return
}
if err := sw.Flush(); err != nil {
fmt.Println(err)
return
}
if err = f.SaveAs("Book1.xlsx"); err != nil {
fmt.Println(err)
}
}This change will caused no-new line after A1 cell value text:
text
After this PR change:
text
So, I don't think we need to roll back the change 9999221.
|
@xuri Thanks, I got it! Then I do not see another way like copy this small method and make it work as we expect it. |
|
@xuri Or what if we check it before? Can you imagine some problem that can cause it? // trimCellValue provides a function to set string type to cell.
func trimCellValue(value string, escape bool) (v string, ns xml.Attr) {
if utf8.RuneCountInString(value) > TotalCellChars {
value = string([]rune(value)[:TotalCellChars])
}
if value != "" {
prefix, suffix := value[0], value[len(value)-1]
for _, ascii := range []byte{9, 10, 13, 32} {
if prefix == ascii || suffix == ascii {
ns = xml.Attr{
Name: xml.Name{Space: NameSpaceXML, Local: "space"},
Value: "preserve",
}
break
}
}
if escape {
var buf bytes.Buffer
_ = xml.EscapeText(&buf, []byte(value))
value = buf.String()
}
}
v = bstrMarshal(value)
return
}And we have this one |
xuri
left a comment
There was a problem hiding this comment.
Yeah, your lasted change will escape \n in different way:
Before:
<c r="A1" s="1" t="inlineStr">
<is>
<t xml:space="preserve">text
</t>
</is>
</c>After this PR change:
<c r="A1" s="1" t="inlineStr">
<is>
<t xml:space="preserve">text
</t>
</is>
</c>This change will caused no-new line after A1 cell value text in Windows Office 2007, but works on Windows Office 2010, Excel for Mac.
|
What about others? I think we also have a problem with those symbols because we will replace them with: |
|
The The Therefore, I suggest maintaining the current |
xuri
left a comment
There was a problem hiding this comment.
Thanks for your PR. Any benchmark data on the performance impact of using xml.EscapeText instead of xml.NewEncoder? Specifically, how much memory is saved, and what percentage of speed improvement can be expected? I don't recommend copying code from the standard library; if necessary, it would be better to submit a patch to improve the Go standard library directly.
|
Hi! The file contains around 80k lines
Speed is the same because the function under the hood is familiar, but the It is around 119 times bigger than was. I would say too much :) I think we need to find the best solution here. Extending the original lib you will see as the best solution? |
|
As an alternative solution, it can be something like that, but here we need to use global var if it is ok, we go with this as well. If we need concurrency we can add the mutex. package excelize
import (
"bytes"
"encoding/xml"
)
var xmlEncoder = newEncoder()
type encoder struct {
*xml.Encoder
buf bytes.Buffer
}
func newEncoder() *encoder {
e := new(encoder)
e.Encoder = xml.NewEncoder(&e.buf)
return e
}
func (x *encoder) encode(str string) string {
if str == "" {
return ""
}
_ = x.EncodeToken(xml.CharData(str))
_ = x.Flush()
defer x.buf.Reset()
return x.buf.String()
} |
|
Hi @artur-chopikian, I think using following changes would be better approach: -var buf bytes.Buffer
-enc := xml.NewEncoder(&buf)
-_ = enc.EncodeToken(xml.CharData(value))
-enc.Flush()
-value = buf.String()
+var buf strings.Builder
+_ = xml.EscapeText(&buf, []byte(value))
+value = strings.ReplaceAll(buf.String(), "
", "\n") |
There was a problem hiding this comment.
Hi @artur-chopikian, please using strings.Builder instead of bytes.Buffer to get better performance.
xuri
left a comment
There was a problem hiding this comment.
LGTM, thanks for your contribution.
PR Details
Memory allocations
Description
xml.NewEncoderusesbufio.NewWriter, which allocates 4096 bytes to every call (every sell with text in the xlsx, you can imagine how much it can be).And this
xml.EscapeTextshows new lines properly in the xlsx file.Types of changes
Checklist